FINAL  REPORT 




FOUNDATIONS  OF  OBJECT  DETECTION  AND  RECOGNITION 

Brown  University  /  PI  Prof.  Ulf  Grenander  /  ARPA  Grant  No.  MDA972-93-1-0012 

August  20,  1998 


Start  Date:  1  July,  1993 

End  Date:  31  August,  1997 

Funding Level: $1,068,554 Total 4-Year Funding From All Sources

($227,090 - 1 JUL 93 - 30 SEP 93; $277,893 - 1 OCT 93 - 30 SEP 94; $186,192 - 1 OCT
94 - 30 SEP 95; $255,444 - 1 OCT 95 - 30 SEP 96; $121,934 - 1 OCT 96 - 31 AUG 97)

Principal  Participants  on  the  Project:  Ulf  Grenander  (Brown  University),  Donald 
Geman  (University  of  Massachusetts  and  Brown),  Stuart  Geman  (Brown),  Basilis  Gidas 
(Brown),  Donald  McClure  (Brown),  Chris  Raphael  (Brown),  E.  Bienenstock  (Brown  and 
University  of  Paris),  Anuj  Srivastava  (Brown  and  Florida  State  University). 

OBJECTIVES: 

1)  Develop  mathematical  foundations  for  a  unified  approach  to  object  (in  particular 
target)  detection  and  recognition. 

2)  Accommodate  multi-sensor  data  and  scenarios  with  large  numbers  of  objects. 

3)  Accommodate  rigid  and  non-rigid  transformations,  variations  in  lighting  conditions, 
contrast,  noise,  blur,  and  clutter. 

4)  Overcome  the  massive  computational  hurdles  inherent  in  the  general  problem  of  object 
detection  and  recognition. 

NOTABLE  ACCOMPLISHMENTS. 

1.  The  starting  point  when  building  a  mathematical  foundation  for  ATR  has  been  the 
assertion  that  in  order  to  be  able  to  see  and  understand  scenes  it  is  necessary  to  have  some 
prior  knowledge  about  the  scene  ensemble  that  is  expected  to  be  encountered.  We  construct 
such  priors  using  pattern  theoretic  ideas  and  try  to  catch  the  essential  characteristics  of  the 
scene  ensemble  through  the  introduction  of  a  configuration  space  C.  This  leads  to  prior 
probability measures Π on C. It should be noted, however, that there will be several such
measures, each one corresponding to the knowledge we happen to have in a particular situation,
say parametrized through a parameter vector θ ∈ Θ. The coordinates of the parameter
space Θ can express, for example, meteorological conditions such as temperature, lighting
conditions such as the position of the Sun, or tactical conditions such as the type and number of
vehicles expected in the scene, when such knowledge, more or less accurate, has been made
available through other means. In this way the probability measure will be conditioned into
some Π_θ on C.

In  this  approach  to  ATR  an  instrumental  idea  is  the  role  of  prior  conditions  made  explicit 
through parameters θ. One could express this by saying that these parameters represent
a  statistical  map  of  the  potential  scene  ensemble.  Typically  the  map  will  give  only  an 
incomplete  description  of  the  scene,  perhaps  that  it  is  of  type  ’desert’  without  specifying  the 
location  of  individual  sand  dunes,  or  that  it  can  contain  vehicles  of  a  certain  type  without 
specifying  their  location  and  orientation,  nor  their  number  n  >  0. 

The  potential  OOI’s  (Objects  Of  Interest)  are  represented  in  more  detail,  say  through 
CAD  models  -  templates  operated  on  by  low  dimensional  transformation  groups,  generically 
denoted by S. The group could be, for example, the Euclidean group in the plane, S = SE(2),
for totally rigid objects, augmented if necessary by a low number of dimensions if rigidity is
not total, for example that of a rotating turret. Or, for FLIR sensors, the thermal profile
is represented by a low dimensional multiplicative group depending upon a θ that expresses
temperature  conditions  and  recent  object  activity. 
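
As an illustration of a template operated on by the planar Euclidean group, the following minimal sketch applies a rigid motion (rotation angle plus translation) to a polygonal outline and composes two motions. The function names, the tuple representation (angle, tx, ty), and the rectangular template are assumptions made for this example; they are not taken from the project's software.

```python
import math

def se2_apply(points, angle, tx, ty):
    """Rotate 2-D points by `angle` (radians), then translate by (tx, ty)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def se2_compose(g, h):
    """Group product g*h: the motion obtained by applying h first, then g."""
    a1, x1, y1 = g
    a2, x2, y2 = h
    c, s = math.cos(a1), math.sin(a1)
    return (a1 + a2, c * x2 - s * y2 + x1, s * x2 + c * y2 + y1)

# Hypothetical template: a 2 x 1 rectangle standing in for a CAD outline.
template = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]

# Pose the template: quarter turn, then shift to (5, 3).
posed = se2_apply(template, math.pi / 2, 5.0, 3.0)

# Composition agrees with applying the two motions one after the other.
g = (math.pi / 2, 5.0, 3.0)
h = (math.pi / 6, -1.0, 2.0)
via_product = se2_apply(template, *se2_compose(g, h))
via_sequence = se2_apply(se2_apply(template, *h), *g)
```

A partially rigid object such as a tank with a rotating turret would, under the same assumptions, carry one extra angle that is applied to the turret sub-template only.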

In  dynamic  situations,  for  example  searching  for  aircraft  in  air  space,  the  probabilities 
on  the  transformation  groups  represent  what  is  known  about  the  dynamics  of  the  targets: 
mass, moments of inertia, limitations on thrust and torque, etc., and relate them to the
development of trajectories through the equations of Newtonian mechanics. We have explored
this  possibility  and  constructed  priors  also  allowing  several  targets  in  the  scene. 

This  is  all  put  together  by  using  the  compositional  aspect  of  pattern  theory,  through 
which  we  combine  OOI’s  with  background,  and  the  transformational  aspect,  through  which 
we  modify  the  resulting  scenes  by  applications  of  transformations  from  S. 

The prior knowledge has now been represented by a prior probability measure Π_θ, which
is then conditioned by the information acquired by the sensors: a mathematical structure,
the deformed image I^D, which is typically an array, not necessarily rectangular. A cross
array of radars, for example, would have an output consisting of complex scalars arranged in
a cross-like configuration. To formalize this we often write I^D = n(Tc), where the sensor
transformation T takes the true (but unknown) configuration c ∈ C into an array Tc, and
n(I) means a noise array depending upon the image array I. In simple situations the noise
can be additive Gaussian, in quantum limited situations it can be Poisson, and so on.
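
The sensor model, a noise operation n applied to the sensor output Tc, can be sketched as follows for the two noise types just mentioned. This is a toy example under stated assumptions: the one-dimensional array, the point-object rendering in `sensor_T`, and all numerical values are hypothetical and stand in for a real sensor model taken from the literature.

```python
import math
import random

def sensor_T(config, width=8, background=1.0):
    """Toy sensor map T: each object (pixel index, intensity) brightens one pixel."""
    image = [background] * width
    for position, intensity in config:
        image[position] += intensity
    return image

def additive_gaussian(image, sigma, rng):
    """Noise n(I) = I + e, with e independent Gaussian of standard deviation sigma."""
    return [value + rng.gauss(0.0, sigma) for value in image]

def poisson(image, rng):
    """Quantum-limited noise: each pixel is a Poisson count with mean I(pixel).

    Uses Knuth's multiplication method, adequate for small means.
    """
    counts = []
    for mean in image:
        limit, k, p = math.exp(-mean), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        counts.append(k)
    return counts

rng = random.Random(0)
c = [(2, 4.0), (5, 6.0)]            # hypothetical configuration: two point objects
ideal = sensor_T(c)                 # noise-free array Tc
observed_gauss = additive_gaussian(ideal, 0.5, rng)
observed_poisson = poisson(ideal, rng)
```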

We have not tried to contribute to the mathematics of sensor technology since this is outside
our domain of expertise. Instead we have relied upon the literature to choose T, n(·), ....
This done, we get the posterior probability measure P_θ(dc | I^D) on C conditioned by the
observed image(s) through a straightforward application of Bayes’ theorem; this posterior
measure contains all the available information.
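
Written out, the application of Bayes' theorem takes the following form. The symbol L for the likelihood is introduced here for illustration and is not the report's notation; the second line gives the special case of additive Gaussian noise with an assumed variance \sigma^2.

```latex
% Posterior on the configuration space C, given the observed image I^D
% and the prior \Pi_\theta; L(I^D \mid c) is the likelihood fixed by T and n.
\[
  P_\theta(dc \mid I^D)
    = \frac{L(I^D \mid c)\,\Pi_\theta(dc)}
           {\int_{C} L(I^D \mid c')\,\Pi_\theta(dc')}
\]
% Example: additive Gaussian noise, I^D = Tc + e, with e \sim N(0, \sigma^2 \mathrm{Id}).
\[
  L(I^D \mid c) \propto \exp\!\Big(-\frac{\lVert I^D - Tc \rVert^2}{2\sigma^2}\Big)
\]
```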

To  exploit  the  information  we  have  built  inference  engines  that  synthesize  the  posterior 
measure.  The  engine  solves  the  equations  of  a  jump-diffusion  process  recursively  and  the 
solution  has  been  used  for  target  recognition  and  detection  but  could  also  be  applied  to  get 
optimal prediction of the future behavior of the target(s). This setup allows for multiple
sensors as well as for multiple targets.
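
To make the idea of a jump-diffusion inference engine concrete, here is a heavily simplified sketch on a one-dimensional toy scene. It is not the project's implementation. The assumptions are: objects are Gaussian blobs at unknown positions, the number of objects has a Poisson prior with mean RATE, positions are uniform a priori, the diffusion is an unadjusted Euler-discretized Langevin step (moves leaving the scene are discarded, so it is only approximate), and jumps are birth/death moves accepted by a Metropolis-Hastings rule on a fixed schedule rather than at random times. All constants and names are hypothetical.

```python
import math
import random

WIDTH, AMP, BLOB, SIGMA, RATE = 32, 5.0, 1.5, 1.0, 1.0  # illustrative constants

def render(xs):
    """Noise-free image Tc: one Gaussian blob per object position in xs."""
    return [sum(AMP * math.exp(-(s - x) ** 2 / (2 * BLOB ** 2)) for x in xs)
            for s in range(WIDTH)]

def log_lik(xs, data):
    """Log-likelihood under additive Gaussian noise, up to a constant."""
    return -sum((d - m) ** 2 for d, m in zip(data, render(xs))) / (2 * SIGMA ** 2)

def grad_log_lik(xs, data):
    """Gradient of log_lik with respect to each object position."""
    resid = [d - m for d, m in zip(data, render(xs))]
    return [sum(r * AMP * math.exp(-(s - x) ** 2 / (2 * BLOB ** 2)) * (s - x)
                for s, r in enumerate(resid)) / (BLOB ** 2 * SIGMA ** 2)
            for x in xs]

def diffuse(xs, data, step, rng):
    """One Langevin step on the positions; the number of objects is unchanged."""
    grad = grad_log_lik(xs, data)
    new = [x + 0.5 * step * g + math.sqrt(step) * rng.gauss(0.0, 1.0)
           for x, g in zip(xs, grad)]
    return new if all(0 <= x <= WIDTH for x in new) else xs

def jump(xs, data, rng):
    """Birth or death of one object, accepted with a Metropolis-Hastings ratio."""
    n = len(xs)
    new = list(xs)
    if rng.random() < 0.5:                      # birth: position drawn from the prior
        new.insert(rng.randrange(n + 1), rng.uniform(0, WIDTH))
        log_ratio = log_lik(new, data) - log_lik(xs, data) + math.log(RATE / (n + 1))
    elif n > 0:                                 # death: remove a random object
        del new[rng.randrange(n)]
        log_ratio = log_lik(new, data) - log_lik(xs, data) + math.log(n / RATE)
    else:
        return xs                               # nothing to remove
    return new if rng.random() < math.exp(min(0.0, log_ratio)) else xs

def jump_diffusion(data, sweeps=200, jump_every=5, step=0.01, seed=0):
    """Alternate jumps and diffusion steps; return the visited configurations."""
    rng = random.Random(seed)
    xs, trace = [], []
    for t in range(sweeps):
        xs = jump(xs, data, rng) if t % jump_every == 0 else diffuse(xs, data, step, rng)
        trace.append(list(xs))
    return trace

noise_rng = random.Random(1)
truth = [10.0, 22.0]                            # hypothetical true configuration
data = [m + noise_rng.gauss(0.0, SIGMA) for m in render(truth)]
trace = jump_diffusion(data)
```

Each entry of `trace` is one visited configuration; summaries such as the distribution of `len(xs)` over the later entries approximate the posterior on the number of objects under the toy model.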

2.  The  methodology  described  above  has  been  implemented  for  the  following  scenarios: 

a)  One  or  several  rigid  objects  -  tanks,  APC...  -  observed  with  FLIR  using  CAD  models 
for the OOI and simulated background.

b)  A  flying  object  observed  with  a  combination  of  visible  light  camera  with  high  resolution 
radar,  synthesized  noise. 

c)  A  rigid  object  observed  with  a  visible  light  camera  with  stereographic  projection. 

3. Within this framework we have derived metrics for ATR, in particular Hilbert-Schmidt
lower bounds, both for detection/recognition error probabilities and for estimation errors in
the Euclidean group S = SE(2). Since this group has curved geometry the usual linear-quadratic
metrics are not suitable. We have argued that non-convex cost functions must
be  used  and  that  ambiguities  in  inferences,  that  result  from  the  lack  of  convexity,  must  be 
explicitly  allowed  in  any  realistic  evaluation  of  performance  in  such  cases. 
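
A small sketch shows why a linear-quadratic error is unsuitable on a curved group, restricted here to the rotation part of SE(2). For rotation matrices R(a) and R(b), the squared Hilbert-Schmidt (Frobenius) distance is 4 - 4 cos(a - b), and the angle minimizing the average of this distance over a sample is atan2 of the summed sines and cosines. The sample values and function names are assumptions for illustration; the bounds derived in the project are not reproduced here.

```python
import math

def hs_sq_dist(a, b):
    """Squared Hilbert-Schmidt distance between planar rotations by angles a and b."""
    return 4.0 - 4.0 * math.cos(a - b)

def hs_mean_angle(angles):
    """Angle minimizing the average squared Hilbert-Schmidt distance to the samples."""
    c = sum(math.cos(t) for t in angles)
    s = sum(math.sin(t) for t in angles)
    return math.atan2(s, c)

# Hypothetical orientation samples clustered around 0 degrees, across the wrap-around.
samples = [math.radians(d) for d in (350, 355, 5, 10)]

naive = sum(samples) / len(samples)   # arithmetic mean of the angles: 180 degrees
hs = hs_mean_angle(samples)           # close to 0 degrees, inside the cluster
```

The arithmetic mean lands opposite the cluster, while the Hilbert-Schmidt minimizer stays inside it. If the samples were spread evenly around the circle, the summed sines and cosines would both vanish and every angle would be equally good, which is one form of the ambiguity that the text says must be allowed for.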

The  analytically  derived  lower  bounds  have  been  compared  to  numerical  results  obtained 
by  Monte  Carlo  simulation  in  scenario  c);  the  results  were  approximately  the  same. 

4.  The  computational  feasibility  of  these  inference  engines  has  been  explored  and  saccadic 
search  algorithms  have  been  designed  to  speed  them  up.  This  part  of  our  work  is  still  in  a 
preliminary  form. 

5.  We  believe  that  an  important  component  in  our  approach  is  still  missing:  the  pattern- 
theoretic  representation  of  clutter.  As  mentioned  above  we  need  a  mathematical  description 
of  the  whole  scene,  and  this  includes  clutter,  in  order  to  make  optimal  inferences.  For  this 
reason  we  have  begun  to  study  clutter  systematically.  Earlier  it  was  difficult  to  get  access 
to  real  data  with  clutter  but  that  has  become  possible  via  some  image  data  bases.  We  have 
used  in  particular  the  MSTAR  data  base.  We  have  started  with  two  clutter  types: 

A)  Forest  type  clutter  of  clustering  trees 

and 

B)  Clutter  where  the  dominating  features  are  roads  in  a  landscape. 

In both cases some analytical results have been obtained but it is too early to claim
success as far as realism is concerned. This work is continuing.

6.  A  formal  framework  has  been  developed  for  object  modelling  and  image  interpretation 
based upon ideas from the cognitive sciences. Collaboration is ongoing with neurophysiologists
to test specific predictions about patterns of activity in multi-unit recordings.

Related  to  this  is  the  development  of  a  theory  of  computational  anatomy  for  use  in 
medical imaging, in particular using magnetic resonance cameras. This is being done in
collaboration with radiologists, psychiatrists and neuroscientists at Washington University and
Iowa  University.  Some  concrete  results  were  obtained  for  the  early  diagnosis  of  schizophrenia 
based  on  shape  changes  in  the  hippocampus. 

Both  of  these  research  activities  are  in  the  form  of  technology  transfer.  They  do  not  deal 
directly  with  ATR  but  are  based  on  mathematical  techniques  that  we  have  constructed  for 
ATR.  Also,  some  of  these  ideas  were  employed  for  the  detection  of  mines  in  shallow  water. 

7.  The  problem  of  reconstructing  surfaces  from  SAR  data  has  been  studied  in  the  context 
of  a  particular  SAR  application,  namely  the  reconstruction  of  data  collected  by  the  Magellan 
probe  of  Venus.  It  is  believed  that  this  mathematical  analysis  should  be  relevant  to  other 
uses  of  SAR. 

8.  To  disseminate  our  findings  to  other  researchers  we  have  of  course  used  the  usual 
method of publishing: technical reports, papers in professional journals, and conference
talks and proceedings. In addition we have authored two CD-ROMs. One, entitled “Automated
Target Recognition; A Bayesian Approach” by A. Srivastava and U. Grenander, contains
a fairly non-mathematical presentation of our ATR work. The other, called “Evolving
Anatomies” by U. Grenander and L. Matejic, discusses our approach to Computational
Anatomy.  Both  CDs  have  been  distributed  widely  but  copies  are  still  available.  We  plan  to 
author  further  CDs  based  on  our  research  in  the  future. 

Meetings  have  been  organized  to  present  our  results.  One  was  a  workshop  on  ATR  and 
was held at Brown University in 1996. Another, which dealt with Computational Anatomy, was
organized  at  Washington  University  in  1996. 

CONCLUSIONS. The task of developing a mathematical foundation for a unified approach
to object recognition has been completed to some extent. We believe that this has led to
a better understanding of what is really needed in ATR theory and, partially, of how this can
be  realized.  We  now  have  the  beginning  of  an  ATR  theory  and  hope  that  it  will  be  exploited 
by  interested  members  of  the  ATR  community. 